ABSTRACT

Although methods for estimating item difficulty are abundant, little attention has been given to the psychological processes involved when a student responds to a single test item. Mastery of educational objectives is not proven when a student supplies the correct answer to items intended to test these objectives. The student's problem-solving method may differ from that intended by the test writer; there is a difference between the student employing the desired process and producing the desired product. The usefulness of the Taxonomy of Educational Objectives as a guide for item writing is questioned. (BJG)
University of Pennsylvania          Ithaca College

In the area of psychometrics, methods for estimating item difficulty are abundant. The literature is replete with such techniques and with statistical methods of correcting item indices for various forms of error, such as guessing and partial information. Recently, however, certain psychologists and educators have approached the basic issues of item analysis (of which difficulty estimation is a central part) from a different point of view.

Smedslund (196?) has argued that the Anglo-American tradition is "test happy" and that we have paid almost no attention to the "psychological processes involved in the responses to single test items." Cronbach (1960) pointed out the general lack of concern with the nature of item solution processes. Thus, while there are many techniques for deriving a numerical estimate of difficulty, there is a need for experimental investigation of why certain items are more difficult than others. This sort of information is essential both in test construction and for a complete understanding of the cognitive components of test-taking behavior.

Campbell (1961) proposed two basic classes of difficulty determinants, which he labelled the "external" and "internal" factors. The former "influence the percent of subjects passing an item and yet are not relevant to the process(es) that the item is intended to measure" (p. 901). The latter class includes those factors which pertain to the process the item is intended to measure.

Almost exclusively, the emphasis in research has been upon the relationship between difficulty and external factors, such as item position, directions, and examples. This is true even though the variance in item difficulty indices is a function of both "external" and "internal" determinants.

One of the internal determinants proposed by Campbell was the effect on difficulty of changes in item complexity. (This dimension refers to the complexity of the cognitive process the item measures rather than the typographical "complexity" of the printed item, e.g., the number of alternatives or the length of the stem.)

While the area of item complexity has been studied for many years (Scates, 1936), it is still not well understood. More recently, however, Bloom et al. (1956) formulated the Taxonomy of Educational Objectives, which has offered a method of classifying the objectives of education and thus the test items used to assess mastery of these objectives.
Bloom hypothesizes that the "cognitive domain" is divided into six broad areas: "knowledge, comprehension, application, analysis, synthesis and evaluation." The hierarchy is hypothesized to be one of cumulative complexity; that is, the behaviors within any one category include all those behaviors in the theoretically less complex categories. In addition, these behaviors are expressed only as those that a given test item is intended to elicit. An item is therefore classified in category i if it is intended to elicit those behaviors common to category i and to all lower categories. These authors indicated that any possible relationship between actual and intended behaviors was within the province of evaluation. This approach to the problem is not at issue. However, in order to evaluate performance in this framework, one must have some degree of assurance, by way of empirical relationships, that the behaviors students employed to solve various test problems were those intended by the test writer. The authors of the Taxonomy, however, appear to have simply equated mastery of objectives with supplying the correct answer to items intended to test those objectives. This implies an identity of intended and actual student behaviors which should instead be demonstrated if these approaches are to be useful in test construction. The purpose of these studies is to identify the solution strategies students employ when answering items that have been classified on the basis of intended strategies. If one cannot demonstrate a relationship between actual and intended item solution behaviors, the use of the Taxonomy in test construction must be carefully reconsidered.
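The cumulative classification rule described above can be made concrete with a short sketch. It is a minimal illustration that assumes only the six category names listed by Bloom; the `Level` enum and the `intended_behaviors` helper are hypothetical names introduced here, not part of the Taxonomy itself.

```python
from enum import IntEnum


class Level(IntEnum):
    """The six cognitive categories, ordered from least to most complex."""
    KNOWLEDGE = 1
    COMPREHENSION = 2
    APPLICATION = 3
    ANALYSIS = 4
    SYNTHESIS = 5
    EVALUATION = 6


def intended_behaviors(category: Level) -> list[Level]:
    """Under the cumulative hypothesis, an item classified in `category`
    is intended to elicit that category's behaviors and those of every
    less complex category (hypothetical helper, for illustration only)."""
    return [level for level in Level if level <= category]


print([lvl.name for lvl in intended_behaviors(Level.APPLICATION)])
# ['KNOWLEDGE', 'COMPREHENSION', 'APPLICATION']
```

An integer-valued enum is used so that "less complex than" is simply numeric comparison, which is exactly what the cumulative hypothesis asserts.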

METHOD

Description of Testing Materials

The items and reading passage employed in this study were adapted from those used in a study by Kropp et al. (1966a). These authors constructed a series of taxonomy-type tests and administered them to students in a series of Florida high schools. Although their tests covered four content areas, only the materials on glaciers were used in the present study.

From the items used by Kropp et al., 50 four-choice multiple-choice items were chosen to form four subtests of approximately 12 items each. As in the Kropp et al. study, only the first four levels of the Taxonomy were considered. In addition, Kropp et al. classified an item in category i on the basis of the opinion of experts in item writing and the Taxonomy. (See pp. 76-79 for a full description of their procedure.) Thus, the items were initially classified only on the basis of intended item-answering behaviors.

The reading passage was that used by Kropp et al., with slight modifications. The passage contained approximately 700 words and was judged to be of high interest value for secondary school students. The vocabulary used in the test items and reading passage was at approximately the ninth-grade level.

Subjects

The 71 subjects in the study were 11th-grade high school students in suburban upstate New York. The students were generally "college-oriented" and above average in intelligence.

Procedure

Approximately one week before testing, the reading materials were distributed to the students, and they were instructed to read the material over and to study in preparation for a quiz the following week.

At the testing session, the students received test booklets of 25 pages with approximately two items on each page. A space was left under each item for the student to record his method of solution immediately after answering each item. The students were instructed in doing this task to try to write down why they selected the particular answer they did. The instructions were written so that they did not suggest reasons to the students. The data-generating procedure in this study was a modified "think aloud" strategy, one which has been employed as a vehicle for assessing problem-solving styles (Bloom and Broder, 1950; Miller, Galanter, and Pribram, 1960; Johnson, 196?).

The subjects were informed that they were part of a study from which it was hoped their teachers could derive information that would help them in the classroom. The students' cooperation was requested and, in the opinion of those who observed the situation, it was generally obtained.

The recording of the solution processes for each item was the technique used to identify the "process response" for each student. This term was borrowed from Kropp et al. (1966b). As they indicated:

"The choice of the proper response measure is crucial if one wishes to obtain the best evidence on which to validate any behavioral measure. In the case of the Taxonomy, two possible response measures come immediately to mind. One is whether the desired intellectual process is used by the student. The other is whether the student gives a correct response to an item. The former will be referred to as the process response; the latter, the product response" (p. 70).



Preparation of Data and Analysis

An ordinal scale was constructed to serve as a set of standards against which the recorded solution processes of each student could be judged.

A Level 1 response indicated agreement between the actual and intended solution processes: the student described his solution method using behavior specified by the Taxonomy or behavior synonymous with it.

For example, the items assessing Knowledge objectives involve the process of remembering. This includes the recall of information and also the reorganization of the stimuli the item presents in order to provide cues for recall. As the Knowledge category is outlined in the Taxonomy, the material in the test items relates specifically to information which has been made available to the student in a lecture, a textbook, or some other communication format. Examples of a Level 1 response for a Knowledge item would be similar to the following: "It was in the paragraph you gave us" or "I remembered it from the handout."

Second-order (Level 2) responses involve what is sometimes called "poverty of content," that is, a general response which indicates something close to Level 1 but which lacks a particularly crucial element. For example, relative to a Knowledge item, a response such as "I thought about it" would be a Level 2 response. It is close to Level 1, but it lacks a specific reference to the handout materials.

Finally, a third-order (Level 3) response was characterized by vague generalities. This rank also included all other responses which indicated that the students solved the problem by using a process other than the intended one. For example, this category included responses indicating that the answer to a particular item was simply recalled, when that item was classified in a category other than Knowledge. In addition, all responses which indicated that the students guessed were classified as Level 3.

It should be noted that it is possible for a student to give a Level 1 response and still have answered the item incorrectly. That is, a student can employ the behaviors appropriate to a given taxonomic level but employ them incorrectly. This illustrates the distinction between an item which elicits a particular set of behaviors relative to an objective and the student's attainment of that objective as indicated by a correct response.

There are in fact four possibilities to be considered. The first has been indicated: the item elicits the required behaviors but the student responds incorrectly. Second, an item can elicit the required behaviors and be answered correctly. In both of these instances, information is obtained about a student's attainment of a particular educational objective.

The remaining two possibilities arise when an item does not elicit the behaviors relevant to a given objective and the student responds either correctly or incorrectly. Within the present framework, these latter two instances would be indicated by a Level 3 response. Therefore, as part of the analysis of actual vs. intended solution processes, all responses must be considered, whether the student responded correctly or incorrectly.
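The four possibilities amount to crossing the process response with the product response. The sketch below is a minimal illustration of that cross-classification, assuming (as the later median criterion suggests, although this passage names only Levels 1 and 3) that a rating of 1 or 2 means the item elicited the intended behaviors. `Cell` and `classify_response` are hypothetical names.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    elicited: bool     # process response: did the item elicit the intended behaviors?
    correct: bool      # product response: was the keyed answer chosen?
    informative: bool  # does the response bear on attainment of the objective?


def classify_response(process_level: int, correct: bool) -> Cell:
    """Place one rated response in the 2 x 2 scheme of process by product.

    Assumption: levels 1 and 2 count as "elicited"; level 3 does not.
    """
    if process_level not in (1, 2, 3):
        raise ValueError("process level must be 1, 2, or 3")
    elicited = process_level <= 2
    # Correctness alone never makes a response informative; only the process does.
    return Cell(elicited=elicited, correct=correct, informative=elicited)


print(classify_response(1, False))
# Cell(elicited=True, correct=False, informative=True)
print(classify_response(3, True))
# Cell(elicited=False, correct=True, informative=False)
```

The second call shows the case the passage warns about: a correct answer reached by an unintended route says nothing about the objective the item was written to measure.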



For each item, a distribution was made indicating the number of Level 1, Level 2, and Level 3 responses. This distribution was the result of the classification of the students' written responses by two independent raters. Both raters were given a set of standards indicating examples of Level 1, Level 2, and Level 3 responses for each of the four taxonomic categories considered. These standards were written on the basis of the behavioral descriptions for each level included in the Taxonomy.

For any responses on which the judges disagreed, the higher of the two ranks was assigned. Otherwise, the ranks remained as reported by the judges.
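The reconciliation and tallying steps can be sketched as follows. The passage does not say which direction "higher" runs on a scale where Level 1 is the best agreement; this sketch assumes it means the numerically larger level, the more conservative reading. `reconcile` and `level_distribution` are hypothetical names, and the ratings in the usage line are invented.

```python
from collections import Counter


def reconcile(rating_a: int, rating_b: int) -> int:
    """Combine two judges' levels for one response.

    Agreement keeps the shared level. Disagreement takes the numerically
    larger level (assumption; if "higher rank" meant nearer to Level 1,
    use min instead).
    """
    return max(rating_a, rating_b)


def level_distribution(judge_a: list[int], judge_b: list[int]) -> Counter:
    """Count Level 1, 2 and 3 responses for one item after reconciliation."""
    if len(judge_a) != len(judge_b):
        raise ValueError("both judges must rate the same responses")
    return Counter(reconcile(a, b) for a, b in zip(judge_a, judge_b))


# Invented ratings of five responses to one item
dist = level_distribution([1, 2, 3, 1, 2], [1, 3, 3, 2, 2])
print(dict(sorted(dist.items())))  # {1: 1, 2: 2, 3: 2}
```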

After the distribution of rating levels was obtained for each item, the median of the ratings of the students' responses was determined. A median less than or equal to two was taken as the definition of agreement between the taxonomic process intended by the item writer and the actual process employed by the students, as indicated by their written solutions. If the median was greater than two, the item was defined as misclassified, since sufficient correspondence between actual and intended behaviors was not demonstrated.
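The decision rule for each item can be expressed as a small function over the per-item distribution. This is a sketch, not the authors' procedure in code; `item_agrees` is a hypothetical name, and the two distributions below are invented totals for 71 students.

```python
from statistics import median


def item_agrees(distribution: dict[int, int]) -> bool:
    """Apply the agreement criterion to one item.

    `distribution` maps each level (1, 2, 3) to the number of students
    rated at that level. The item agrees when the median rating is <= 2
    and is treated as misclassified otherwise.
    """
    ratings = [level for level, count in sorted(distribution.items())
               for _ in range(count)]
    if not ratings:
        raise ValueError("no ratings for this item")
    return median(ratings) <= 2


# Invented distributions for two items, 71 students each
print(item_agrees({1: 10, 2: 20, 3: 41}))  # False: median rating is 3
print(item_agrees({1: 30, 2: 10, 3: 31}))  # True: median rating is 2
```

With 71 ratings the median is always one of the observed levels. With an even count, `statistics.median` averages the two middle ratings, so an even split between Levels 2 and 3 gives 2.5 and the item would be treated as misclassified.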

RESULTS

Interjudge agreement averaged 76% over all subtests; that is, for all items, the judges agreed on the level of the written solution responses 76 per cent of the time. Across the four subtests, from Knowledge to Analysis, the figures on agreement were 79%, 87%, 86%, and 58%, respectively.
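Percent agreement is the simple statistic reported here. A figure pooled over all responses weights each subtest by the number of responses rated in it, so it need not equal the unweighted mean of the four subtest figures, which is (79 + 87 + 86 + 58) / 4 = 77.5%. The sketch below, with hypothetical function names and invented ratings, shows how the two can diverge.

```python
def percent_agreement(judge_a: list[int], judge_b: list[int]) -> float:
    """Percentage of responses to which two judges assigned the same level."""
    if not judge_a or len(judge_a) != len(judge_b):
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


def pooled_and_mean(subtests: dict[str, tuple[list[int], list[int]]]) -> tuple[float, float]:
    """Return (agreement pooled over every response, unweighted mean of subtest agreements)."""
    all_a = [r for a, _ in subtests.values() for r in a]
    all_b = [r for _, b in subtests.values() for r in b]
    per_subtest = [percent_agreement(a, b) for a, b in subtests.values()]
    return percent_agreement(all_a, all_b), sum(per_subtest) / len(per_subtest)


# Invented ratings: a small subtest with perfect agreement, a larger one with 50%.
toy = {
    "Knowledge": ([1, 1], [1, 1]),
    "Analysis": ([1, 2, 3, 3], [3, 2, 1, 3]),
}
pooled, mean = pooled_and_mean(toy)
print(round(pooled, 1), mean)  # 66.7 75.0
```

When the low-agreement subtest contributes more responses, the pooled figure falls below the unweighted mean, which is one way a reported overall figure can sit below the average of the subtest figures.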



It was anticipated that agreement between intended and actual student behaviors would be obtained more often for the Knowledge items than for any of the others. The reasons for this were that the Knowledge level is discussed in greatest detail in the Taxonomy, it is the easiest to specify and rate ("I remembered it from the reading passage."), and item writers have had the most practice writing recall items.

The anticipated results were obtained, although they were confined to only six of the 50 items. For these items, there was the correspondence which would be expected on the basis of the use of the Taxonomy; that is, only for these items was the median of the distribution of ratings of the written solution responses less than or equal to two. Four of these were Knowledge items. In addition, process agreement was obtained for only one Application item and one Analysis item. No Comprehension item met the criterion.

When this study was originally developed, it was intended that the simplex model be applied to the data. More specifically, the method of scaling a simplex devised by Kaiser (1962) would be applied, giving the order of subtests that most closely forms a scale of complexity. Then, on the basis of the ratings of the written solution responses, the items would be reclassified and the Kaiser scaling recalculated. Since the reclassified subtests would be theoretically more homogeneous with respect to the process elicited, the intercorrelation matrix of subtest scores should then more closely form a perfect simplex.
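Kaiser's (1962) scaling procedure is not reproduced here. As a plainly swapped-in stand-in, the sketch below illustrates what it means for one order of subtests to form a simplex "most closely": it tries every ordering of a subtest intercorrelation matrix and counts breaches of the simplex pattern, in which correlations fall off with distance from the main diagonal. The function names and the correlation values are hypothetical.

```python
from itertools import permutations


def simplex_violations(r: list[list[float]], order: tuple[int, ...]) -> int:
    """Count breaches of the simplex pattern for one ordering of the subtests.

    For positions p < q < s in the ordering, the correlation between the two
    outer subtests should not exceed either correlation with the middle one.
    """
    violations = 0
    n = len(order)
    for p in range(n):
        for q in range(p + 1, n):
            for s in range(q + 1, n):
                a, b, c = order[p], order[q], order[s]
                outer = r[a][c]
                violations += int(outer > r[a][b]) + int(outer > r[b][c])
    return violations


def best_simplex_order(r: list[list[float]], labels: list[str]) -> tuple[list[str], int]:
    """Brute-force every ordering (cheap for four subtests: 24 orders) and keep the best."""
    best = min(permutations(range(len(labels))), key=lambda o: simplex_violations(r, o))
    return [labels[i] for i in best], simplex_violations(r, best)


# Invented correlations forming a perfect simplex in Taxonomy order,
# stored with the subtests deliberately listed out of order.
labels = ["Analysis", "Knowledge", "Application", "Comprehension"]
position = {"Knowledge": 0, "Comprehension": 1, "Application": 2, "Analysis": 3}
r = [[0.9 ** abs(position[x] - position[y]) for y in labels] for x in labels]
order, breaches = best_simplex_order(r, labels)
print(order, breaches)
# ['Analysis', 'Application', 'Comprehension', 'Knowledge'] 0
```

A perfect simplex has no breaches in its true order; because an order and its reverse score the same, the sketch here returns the Taxonomy order back to front. With real subtest correlations, the number of breaches, rather than zero, is what distinguishes competing orders.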
The first simplex scaling was carried out, and the "best order" of subtests was Knowledge, Application, Analysis, Comprehension, an order somewhat different from that posited in the Taxonomy. However, the ratings of the solution strategies did not provide data sufficient for reclassification. For the 44 items where agreement between actual and intended strategies was not found, there was no consensus as to any other taxonomic strategy. Rather, a variety of response styles was evident. Students reported guessing, use of previous knowledge, partial information, and a plethora of other strategies, none of which fit the Taxonomy's frame of reference.

In summary, the overall result of this study was a lack of correspondence between the actual and intended solution processes, as evidenced by the students' written solution strategies.

DISCUSSION

To put the findings into some perspective, it is important to note that the items and passage selected were carefully written, edited, and judged by experts to be appropriate, a procedure far in excess of that available to a classroom teacher. Nevertheless, it is presumably desirable for the item writer to be able to predict the process an item will elicit. In fact, for criterion-referenced tests, it is essential.

As indicated, agreement between actual and intended processes was not demonstrated for 44 of the 50 items. The results of this study call into serious question the usefulness of the Taxonomy in its present form as a guide for item writing. If experts in psychometrics cannot employ its suggestions as an aid in test construction, then its value to the teacher in classroom testing is certainly questionable. Further research is needed to determine the conditions under which the actual-intended discrepancy can be reduced, so that taxonomies can be made useful in test construction.

If teachers and administrators believe it is paramount that one assess cognitive processes other than simple recall, then the task for those constructing tests is great indeed. Further studies are anticipated which would make it easier for students to "think aloud," involve more extended verbal reports of solution strategies, and cover a variety of content areas, so that those working with tests can obtain greater insights into students' problem-solving strategies.
